ABSTRACT


A method of cluster analysis is presented in which
points in n-dimensional space are analyzed through a
subdivisive procedure. The points are orthogonally projected
onto that line which maximizes their variance and the
resulting point distribution is then analyzed with the use
of a histogram. Wherever possible, divisions between
conglomerates of points are made and each separate clump is
subsequently analyzed. Ultimately adjacent groups are
combined and analyzed through an analogous technique in an
effort to re-unite any points which may have inadvertently
deviated from the group with which they truly associate.
The method is later refined to allow the detection of groups
in several point dispersions which would have appeared as a
single conglomeration under the original method. An example
is given to illustrate the applicability of the procedure.
I. INTRODUCTION


A. EXPLANATION OF THE PROBLEM 

It is frequently necessary to classify or distinguish
between events or objects. The course of one's everyday 
life involves numerous automatic and casual classifications. 
For example, without an acute conscious awareness it is 
possible to discriminate between man and woman, bees and 
birds, or various colors. In effect, an entity is
characterized through the outcome of the observations of several
classification characteristics or variates. In cases where
the classification is less obvious, each of these n
classification variates may be treated as a dimension in
n-dimensional space. The individual specimens on which the
observations are being made may then be conceived of as
points or n-dimensional vectors plotted on a multi- 
dimensional coordinate system whose axes are the scales of 
the specimen characteristics. Thus, every subject can be 
represented by one point in space. 

Cluster analysis is a method of estimating the true 
groupings of these points. If one were observing
characteristics of a sample from k different populations,
p_1, p_2, ..., p_k, he would then want to divide
n-dimensional space into k mutually exclusive and
collectively exhaustive regions, r_1, r_2, ..., r_k,
with the rule or procedure of assigning an individual to
p_i if it belongs to r_i. In other words, cluster analysis
circumscribes the k areas of high point density in
hyperspace with multi-dimensional boundaries which define 
divisions between the k populations. 

A procedure which could accomplish this has wide
applicability. For example, the dispersion of human
chromosomes as they appear on a photograph of mitosis involves a
two-dimensional array of points; the dispersion of stars
involves a three-dimensional array; bacterial classification
could involve an n-dimensional array as would any
taxonomical problem in which n characters of m species
are treated as variates and represented by m points in
n-dimensional space. Information on the true grouping of
the species could be obtained from the point grouping as
seen through cluster analysis. Thus, most investigations
into the classifications of objects or dispersion of things
require the elucidation of spatial groupings.


B. GENERAL 

There are two distinct methods of attacking the cluster
analysis problem. The first of these is the agglomerative
method which starts with a single point as the "group" and
adds to it others which satisfy certain criteria. A group
is formed when no new points can be added to the existing
conglomeration; another starting point is then chosen and
the process iterates to determine the second group. This
technique is continued until all points are allocated to one
of the k groups. Thus, groups are built up from the
individual points and in this manner n-dimensional space is
divided into k mutually exclusive and collectively
exhaustive regions. The other method of approach employs
subdivisive techniques which arrive at an ultimate partition
of n-dimensional space through an initial partition, or
possibly several initial partitions, of n-dimensional space
and subsequent divisions of each of those partial spaces.


C. HISTORY

The area of cluster analysis has received ever increas- 
ing attention over the last four decades by those interested 
in classification methods. Four of the more renowned pro- 
cedures are briefly reviewed; if it is desired to learn more 
concerning any of these methods the list of references will 
be helpful in directing the reader to sources of greater 
detail. In 1933, Hotelling devised a principal component
analysis method of grouping points which involved the
successive elimination of dimensions. Points are projected
orthogonally from a multi-dimensional space into a space of
fewer dimensions which retains "maximum information." Said
differently, those characteristics having the least amount
of variability among their observations are eliminated and
the space reduced. This process is continued until the
points are finally in an observable space.

The weighted mean pair method developed by Sokal and
Michener (1958) and altered by Rogers (1959) has been
applied to entomological problems. The procedure, again
updated in 1963 by Sokal and Sneath, involves operations
performed with an initially constructed m by m symmetric
matrix, where m is the number of points. The resemblance
between two specimens is the proportion of the number of
characteristics measured which they have in common. Thus,
the larger this similarity ratio, the more alike are the
specimens. Then a distance, d_ij, between any two points,
c_i and c_j, i, j = 1, 2, ..., m, is defined as
d_ij = -log S_ij, where S_ij is the similarity ratio between
points c_i and c_j. It is interesting to notice that as
S_ij approaches 1, d_ij approaches 0. Thus, the symmetric
matrix (d_ij) is in effect a mileage chart in semi-metric
space, showing distance between any two points c_i and c_j.
Then an overall number, H_i, is computed for each point c_i,
where H_i is the sum of d_ij over j = 1, 2, ..., m with
i ≠ j. The point possessing the smallest H_i value is
designated as the prime node and becomes the first member of
the group. Without loss of generality it can be labeled
c_1. The point closest to c_1, c_i, is then added to the
group if its similarity coefficient satisfies certain
specified criteria. With this, the first and ith rows and
columns of the matrix are deleted, one new row and column
representing the combination of the similarity coefficients
of c_1 and c_i added, and the process repeated for the
matrix of order m-1. This method continues until no new
points have similarity coefficients which satisfy the group
entry criteria. Then a smallest H_i value is computed for
those points which were not admitted to the group and the
process iterates to form multi-groupings until all points
are allocated to a group.
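The distance and prime node computation described above can be sketched as follows. This is only an illustration under stated assumptions: the characteristics are taken to be presence/absence values, the similarity ratio is read as the fraction of characteristics on which two specimens agree, and the natural logarithm is used because the text does not state a base. The function names and the sample data are hypothetical.

```python
import math

def similarity(a, b):
    """Fraction of characteristics on which two specimens agree (assumed reading)."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def distance_matrix(specimens):
    """d_ij = -log S_ij; infinite when two specimens have nothing in common."""
    m = len(specimens)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                s = similarity(specimens[i], specimens[j])
                d[i][j] = -math.log(s) if s > 0 else math.inf
    return d

def prime_node(d):
    """Index of the point with the smallest H_i (sum of d_ij over j != i), and all H_i."""
    h = [sum(row[j] for j in range(len(row)) if j != i) for i, row in enumerate(d)]
    return min(range(len(h)), key=h.__getitem__), h
```

For four specimens such as `[1,1,0,0]`, `[1,1,0,1]`, `[1,0,0,1]` and `[0,0,1,1]`, `prime_node(distance_matrix(...))` returns the index of the specimen that is, on the whole, nearest to all the others.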





In 1959, Williams and Lambert concerned themselves with
the case of an n x m data matrix X, where n is the
number of points and m is the number of variates,
consisting entirely of presence or absence data (1 or 0
respectively). Originally they applied their method to taxonomic
problems in ecology, where the variates were the different
plant characteristics present or absent in n individuals.
The n points are divided into two subsets on the basis of
the variate k which "best" separates them (in a
well-defined sense): one set being those that contain k, the
other being those that do not. If it is feasible to divide
the groups with respect to any of the remaining variates,
the appropriate dimension is chosen and the process
continued.

Edwards and Cavalli-Sforza (1965) developed an accurate
but highly tedious subdivisive procedure. Noting that the
best division between clusters would be that which resulted
in the two clusters being as dense as possible, they proceed
to separate the points into two groups through every
possible division of points. They then choose that division
which maximizes the between group variance and minimizes
the within group variance. The procedure is repeated for
each group with a weighting factor, the associated between
clusters sum of squares, to describe the importance of each
division. Note that the drawback to this method lies in the
computational labor of examining all splits. The authors
admit that (n-1)^2 2^(n-11) seconds are required on a computer
with a five microsecond access time; that is, twenty-one
points require 100 hours and forty-one points require 54,000
years.
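As a quick arithmetic check, the quoted running time can be evaluated directly. The sketch below assumes the expression reads (n-1)^2 2^(n-11) seconds, a reading that agrees with both figures quoted in the text; the function name and the year length of 365.25 days are illustrative choices.

```python
def split_time_seconds(n):
    """Quoted time to examine all splits of n points: (n-1)^2 * 2^(n-11) seconds."""
    return (n - 1) ** 2 * 2 ** (n - 11)

SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length

hours_21 = split_time_seconds(21) / SECONDS_PER_HOUR   # roughly 114 hours
years_41 = split_time_seconds(41) / SECONDS_PER_YEAR   # roughly 54,400 years
```

Twenty additional points multiply the time by about 4 x 2^20, which is why the method becomes impractical so quickly.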

Presented in the following pages is a method of cluster
analysis. An example is given to show how the procedure is
of assistance in analyzing problems of a taxonomical nature.
II. PROCEDURE


A. BACKGROUND 

The procedure presented in this paper employs
subdivisive techniques. The method involves the orthogonal
projection of points in n-dimensional space onto a
one-dimensional line and an analysis of the resulting point
distribution. In order to facilitate the group detection
process, it seems desirable to determine that line which 
provides maximum separation between the group projections. 
The difficulty in doing this emanates from not knowing which 
points associate to form groups. A second impediment is 
that the meaning of maximum separation between groups is not 
clear. The concept of maximizing separation between groups 
is subject to several different interpretations, two of 
which follow: the line providing maximum separations 
between groups is obtained when (1) the sum of the distance 
between the closest points of all adjacent groups is a maxi- 
mum or (2) any two adjacent groups have a maximum separa- 
tion, that is, the separation between the projections of two 
adjacent groups is greater than the separation between the 
projections of any other adjacent groups on any line. 


Figure 1 may be helpful in clarifying this distinction: 


Figure 1

The line on which the sum of the distances between the
closest points of all adjacent group projections is a
maximum is ℓ_1. That is, the sum of the distance between
groups one and two plus that between groups two and three is
greatest when projected on line ℓ_1. It thus corresponds
to the first interpretation. The maximum separation
between the projections of two groups, r_1 and r_2, is
provided by ℓ_2. There is no line on which the projections of
any two adjacent groups are separated by a distance greater
than that between groups r_1 and r_2 on line ℓ_2. Thus,
ℓ_2 provides maximum separation with respect to the second
interpretation.

Inherent to subdivisive methods is the possibility of
erring by assigning a point to a group with which it does
not conglomerate. It is, therefore, desirable to make
divisions between groups in a manner which minimizes the
probability of introducing such error. Thus, in every case it is
advantageous to work with lines analogous to ℓ_2 because it
provides a maximum amount of separation between two high
density point areas which are to be divided into two groups.
In this manner, it reduces the probability of erring by
assigning points to the wrong group. However, certain point
dispersions which result in well-defined groups enhance the
utility of lines analogous to ℓ_1, which has the advantage of
dictating more divisions per iteration.

One line which maximizes the separation between groups
under the first interpretation is that line which maximizes
the between group variance and minimizes the within group
variance. However, inadequate knowledge concerning those
vectors which conglomerate or which points associate to
form groups makes this method non-functional
computationally; in this form it involves the analysis of all possible
divisions of points into groups and the subsequent
projection of each combination onto the most suitable line. The
computational labor of such an analysis rivals that of the
method of Edwards and Cavalli-Sforza mentioned earlier.


B. THE APPROPRIATE LINE 

The line which is used is not as discriminating as that 
which maximizes the between group variance and minimizes 
the within group variance, but is computationally more 
accessible. The basic analytical tool is chosen to be that 
one-dimensional ray on which the variance of the orthogonally
projected points is a maximum. Several point dispersions
can be formed for which that line which maximizes
the variance of the projected points does not coincide with 
or even lie near that line which maximizes the separation 
between groups. However, it does appear to be a reasonable 
line in that in most cases it provides a point distribution 
which is very auspicious to group detection in later analy- 
sis. The problems arising when this line is not favorable 
are reckoned with later. The problem of finding a line 
which maximizes the variance of the orthogonally projected 
points can be formulated as a non-linear programming
problem. The objective function is to maximize the variance
of the projected points on a line given by the direction of
u, a unit vector.


The problem is:

$$\max\; s^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})^2$$

subject to: $u^t u = 1$

where:
     $s^2$ = sample variance
     $m$ = number of points
     $x_i = \sum_{j=1}^{n} c_{ij} u_j$
where:
     $n$ = the number of dimensions or variables
     $u^t = (u_1, u_2, \ldots, u_n)$ = unit vector
     $c_i = (c_{i1}, c_{i2}, \ldots, c_{in})$ = coordinates of the
          ith point in n dimensions
     $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i
              = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} u_j$

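The problem now becomes:

$$\max\; s^2 = \frac{1}{m-1} \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} c_{ij} u_j
      - \frac{1}{m} \sum_{h=1}^{m} \sum_{j=1}^{n} c_{hj} u_j \Bigr)^2$$

subject to $u^t u = 1$.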


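The Lagrangian is then:

$$L(u_1, \ldots, u_n, \lambda) = s^2 - \lambda (u^t u - 1)$$

which implies

$$L = s^2 - \lambda \Bigl( \sum_{j=1}^{n} u_j^2 - 1 \Bigr),$$

and substituting in the expression for $s^2$ gives

$$L = \frac{1}{m-1} \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} c_{ij} u_j
      - \frac{1}{m} \sum_{h=1}^{m} \sum_{j=1}^{n} c_{hj} u_j \Bigr)^2
      - \lambda \Bigl( \sum_{j=1}^{n} u_j^2 - 1 \Bigr).$$
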

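Setting the partial derivative of the Lagrangian with
respect to each variable equal to zero gives necessary
conditions from which the optimal vector, u, can be found.
For the kth component of u:

$$\frac{\partial L}{\partial u_k} = 0 = \frac{2}{m-1} \sum_{i=1}^{m}
      \Bigl( \sum_{j=1}^{n} c_{ij} u_j
      - \frac{1}{m} \sum_{h=1}^{m} \sum_{j=1}^{n} c_{hj} u_j \Bigr)
      \Bigl( c_{ik} - \frac{1}{m} \sum_{h=1}^{m} c_{hk} \Bigr)
      - 2 \lambda u_k .$$

Letting $c_{ik} - \frac{1}{m} \sum_{h=1}^{m} c_{hk} = B_{ik}$ and rewriting gives:

$$0 = \frac{2}{m-1} \sum_{i=1}^{m}
      \Bigl( \sum_{j=1}^{n} c_{ij} u_j
      - \frac{1}{m} \sum_{h=1}^{m} \sum_{j=1}^{n} c_{hj} u_j \Bigr) B_{ik}
      - 2 \lambda u_k .$$
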

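Distributing the first summation over the terms:

$$0 = \frac{2}{m-1} \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} c_{ij} u_j \Bigr) B_{ik}
      - \frac{2}{m-1} \sum_{i=1}^{m}
        \Bigl( \frac{1}{m} \sum_{h=1}^{m} \sum_{j=1}^{n} c_{hj} u_j \Bigr) B_{ik}
      - 2 \lambda u_k .$$

Rearranging within each term:

$$0 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{2 B_{ik}}{m-1}\, c_{ij} u_j
      - \Bigl( \sum_{i=1}^{m} \frac{2 B_{ik}}{m(m-1)} \Bigr)
        \sum_{h=1}^{m} \sum_{j=1}^{n} c_{hj} u_j
      - 2 \lambda u_k .$$
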

The second term is identically zero since

$$\sum_{i=1}^{m} B_{ik} = \sum_{i=1}^{m} \Bigl( c_{ik} - \frac{1}{m} \sum_{h=1}^{m} c_{hk} \Bigr)
      = \sum_{i=1}^{m} c_{ik} - \sum_{h=1}^{m} c_{hk} = 0,$$

which shows that

$$\sum_{i=1}^{m} \frac{2 B_{ik}}{m(m-1)} = 0 .$$

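Therefore,

$$\frac{\partial L}{\partial u_k} = 0
      = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{2 B_{ik}}{m-1}\, c_{ij} u_j - 2 \lambda u_k .$$

Setting $\frac{2 B_{ik}}{m-1} = D_{ik}$ yields

$$0 = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ik}\, c_{ij} u_j - 2 \lambda u_k ,
      \qquad k = 1, 2, \ldots, n, \tag{1}$$
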

a necessary condition. Differentiating the Lagrangian with
respect to the Lagrange multiplier, λ, and setting the
derivative equal to zero yields the constraint

$$u_1^2 + u_2^2 + \cdots + u_n^2 = 1 . \tag{2}$$


Thus, u is found through the solution of a system of n+1
equations in n+1 unknowns, u_1, u_2, ..., u_n, λ. There are n
equations of the form (1) and there is one equation of
form (2). It is important to notice that setting the first
partial derivatives of the Lagrangian equal to zero is a
necessary but not a sufficient condition for a maximum. In
fact, on rare occasions when the rank of the Jacobian of
the constraint equations is less than the number of
constraints, these conditions are not even necessary. That
is, for a set of constraints possessing a singular Jacobian,
there may exist solutions which maximize or minimize the
objective function which will not be found using the first
partial derivative of the Lagrangian technique. This will
be ignored here, however. Since there is no guarantee that
a solution to the above system of equations maximizes the
objective function, it is imperative that all solutions be
obtained. The unit vector which maximizes the variance of
the projected points will be one of those solutions and can
be identified as the maximum through its substitution into
the objective function.
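For computation, it may help to note that the stationarity conditions together with the unit-length constraint can be read as an eigenvalue problem for the sample covariance matrix of the points: every solution u is an eigenvector, and the variance it attains equals its eigenvalue, so the maximizing direction is the eigenvector with the largest eigenvalue. The sketch below finds that vector by power iteration, which is a technique substituted here for illustration and not the solution procedure of the text. The function names, the starting vector and the iteration count are assumptions, and power iteration can fail in the rare case where the starting vector is orthogonal to the sought direction.

```python
import math

def covariance(points):
    """Sample covariance matrix (divisor m-1) of a list of n-dimensional points."""
    m, n = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / m for k in range(n)]
    b = [[p[k] - mean[k] for k in range(n)] for p in points]  # centered coordinates
    return [[sum(row[k] * row[j] for row in b) / (m - 1) for j in range(n)]
            for k in range(n)]

def max_variance_direction(points, iterations=500):
    """Unit vector u maximizing the variance of the projections, and that variance."""
    s = covariance(points)
    n = len(s)
    u = [1.0 / (k + 1) for k in range(n)]      # arbitrary starting direction (assumption)
    scale = math.sqrt(sum(x * x for x in u))
    u = [x / scale for x in u]
    for _ in range(iterations):
        v = [sum(s[k][j] * u[j] for j in range(n)) for k in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:                        # degenerate case: stop with current u
            break
        u = [x / norm for x in v]
    variance = sum(u[k] * s[k][j] * u[j] for k in range(n) for j in range(n))
    return u, variance

def project(points, u):
    """Orthogonal projection x_i = sum_j c_ij u_j of each point onto the line."""
    return [sum(c * w for c, w in zip(p, u)) for p in points]
```

For points lying along the diagonal of the plane, such as (0,0), (1,1), (2,2), (3,3), the returned direction is the diagonal itself and the returned variance is that of the projected points.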


C. UNDERLYING THEORY 

Before delving further into the details of the procedure, 
one assumption can be made from which two important observa- 
tions follow; it is hypothesized that the n-dimensional 
hypervolume spanned by the points of any group is nearly a 
hyperellipsoid whose lengths along the axes are real 
numbers greater than or equal to zero. This assumption 
seems consistent with expectations since a hyperellipsoid 
is the shape of the hypervolume that the points will most 
likely span. Furthermore, it imposes virtually no constraint
since it permits the spatial dispersion of points of any
group to generate a multi-dimensional hypervolume of many
forms, from a hyperball (solid hypersphere) to a straight
line segment.





The assumption that the hypervolume spanned by the
points is a hyperellipsoid results in the first observation:
that a curve-fitted histogram plot of the distribution of
points of any group projected on a line has a bell-like
structure similar to that of the normal density;
furthermore, the effect is especially inherent to groups of high
point density. This phenomenon can be reasoned as follows:
ℓ will represent the line onto which the points are to be
orthogonally projected and the two-dimensional case will be
considered. A sufficient number of equidistant parallel
lines, w_1, w_2, ..., w_j, can be constructed which completely
bound and divide the entire group. The value of j would
depend upon the distance between lines. In Figure 2 below,
it can be observed that the expected number of points of the
group contained between two adjacent lines near the center
of the group is greater than that contained between a pair
farther out, which in turn is greater than that between a
pair near the edge. Hence, the curve-fitted histogram can be
expected to exhibit a bell-like structure. Figure 3
indicates that the orientation of the group with respect to
the line is free to vary without altering this result.

Figure 2

Figure 3

Now generalizing this two-dimensional analysis
for the n-dimensional case, ℓ can again represent
the line onto which the points are to be projected. Then,
as before, it is possible to construct a sufficient number
of equidistant parallel (n-1)-dimensional hyperplanes,
w_1, w_2, ..., w_j, to completely bound and divide the
hyperellipsoid. Then, intuitively, the expected number of points
contained between two adjacent hyperplanes passing through a
"thick" portion of the ellipsoid (near its geometric center)
is greater than that contained between two hyperplanes in a
"thinner" portion. Hence, the curve-fitted histogram would
possess a bell-like structure, an effect which is more
salient for increasing density of points. An interesting
extension of this observation is that the normal-like shape
becomes more pronounced as the hyperellipsoid tends to a
hyperball and less pronounced as it elongates; in fact, the
bell-like structure is not discernible for a hyperellipsoid
in the form of a straight line. The concept of ascertaining
information concerning the group shape through the analysis
of point distributions on the line will later be shown to be
of value.

The second observation follows from the first: a
histogram of the projections of two or more groups superimposed
on the line will vary among bell-like, skewed bell, flattened
bell, or any combination of bell-like forms. This
observation is intuitively appealing and a few illustrations will
show that it is indeed logical. The histogram can be
expected to contain local maxima and minima.

Figure 4

The ideas presented in this section will be used as 
keystones in those portions of the procedure concerned with 


analysis of histograms. 


D. THE ANALYSIS OF PROJECTED POINTS 

As stated earlier, the procedure being developed involves
the projection of points on a line and the subsequent
examination of the distribution of projected points. The
dispersion of points on the line is analyzed through the use of a
histogram. Divisions are made between maxima and each
segment again analyzed through the use of a new line.
Ultimately, adjacent groups are combined and analyzed to correct
the possible error that too many divisions were made.

Assuming that the line which maximizes the variance of
the points has been found using the non-linear programming
method shown previously, the next step is to form a
histogram. Fundamental to its construction is a decision
concerning the interval length between the equidistant
parallel (n-1)-dimensional hyperplanes. It has been
determined that, in most cases, dividing the line length spanned
by the two end points into m/2 equal intervals, where m is
the number of points under consideration, appears to be a
reasonable starting interval length. Then if this does not
suffice, the interval length yielding more well-defined
maxima can be determined through experimentation if desired.
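The starting choice of m/2 equal intervals can be sketched as below. The function name is hypothetical, m/2 is rounded down when m is odd (the text does not say how to round), and the right-hand end point is counted in the last interval; these are assumptions made only so the sketch is complete.

```python
def histogram(values, bins=None):
    """Counts of projected points in equal intervals spanning the two end points."""
    m = len(values)
    if bins is None:
        bins = max(1, m // 2)              # starting choice of m/2 intervals
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # All-equal values and the right-hand end point go in the last interval.
        k = bins - 1 if width == 0 or x == hi else int((x - lo) / width)
        counts[min(k, bins - 1)] += 1
    return counts, lo, width
```

Passing a different `bins` value corresponds to the experimentation with interval length mentioned in the text.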
Assuming the efforts in histogram construction are 
successful, its analysis can now be undertaken. If the plot 
is unimodal this phase of the procedure is terminated and 
it can be concluded, for the moment, that all those points 
comprise one group. As a matter of definition, a plot is 
considered to be unimodal if there do not exist two
separate peaks with a point between them, having zero slope
and positive second derivative, whose height is less than
eight-tenths (.8) the height of the lesser peak. Eight-tenths is chosen
through the following reasoning: it is advantageous to 
divide the space into a maximum number of regions to gain 
information concerning the spatial grouping of points. As 
the number of groups analyzed simultaneously decreases, the 
accuracy and informational yield of the analysis increases. 
The fact that groups will later be combined alleviates any 
fear of error emanating from splitting conglomerates of 
points which are actually members of the same fraternity. 
Thus, it is beneficial to the discrimination process to 
impose and exercise a rather ruthless division rule, that is, 
one which dictates divisions readily. On the other hand, an
expected result of the non-uniformity of point density is
that the histogram of a group contains slight dips and rises
superimposed on an overall bell-like form. Therefore, the
ruthlessness in division must be reduced to account for a
jagged histogram form. Thus, the cut-off is arbitrarily set
to be a dip of twenty percent (20%) below the height of the
lesser peak. The following illustrations may be helpful in
clarifying this concept:


Figure 5 





In the above situation, the value of the zero slope point
between the two local maxima is sufficiently low to allow a
partition of the space into two regions. The second curve
permits a three-way partition of n-dimensional space.
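One possible discrete reading of the eight-tenths rule, applied to raw interval counts and not to a curve-fitted histogram, is sketched below. The scan keeps track of the current left-hand peak and the lowest interval seen since; a division is recorded when the counts rise again and that low point is below 0.8 of the lesser peak. The function names are hypothetical and the treatment of flat valleys (the first lowest interval is used) is an assumption.

```python
def find_divisions(counts, ratio=0.8):
    """Indices of valley intervals that separate two peaks under the 0.8 rule."""
    cuts = []
    peak = valley = counts[0]
    valley_idx = 0
    for k in range(1, len(counts)):
        c = counts[k]
        if c < valley:                       # still descending: remember the lowest interval
            valley, valley_idx = c, k
        elif c > valley and valley < ratio * min(peak, c):
            cuts.append(valley_idx)          # dip is deep enough: divide here
            peak = valley = c
            valley_idx = k
        elif c >= peak:                      # higher left-hand peak: forget shallow dips
            peak = valley = c
            valley_idx = k
    return cuts

def is_unimodal(counts, ratio=0.8):
    """True when no division is dictated by the rule."""
    return not find_divisions(counts, ratio)
```

With this reading, counts such as `[2, 8, 3, 9, 1]` are divided at the third interval, while `[10, 9, 10]` is treated as one group because the dip is only ten percent.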
After determining the number of maxima or groups the
histogram portrays, the points between any two adjacent
peaks must be separated between them. The decision rule for
this is to make a division between the maxima at the point
c_i where the curve-fitted histogram takes on zero slope and
has a positive second derivative as indicated in Figures 5
through 7. Then all of those points to the left of c_i are
allocated to the maximum on the left and those to the right
of c_i to the maximum on the right. Note that this
allocation is made with knowledge that for those cases in which
the value of the histogram at the zero slope, positive
second derivative point deviates from zero, i.e., for
overlapping projections of groups on the line, the method will
cause some points to be allocated to a group with which they
do not cluster. This inaccuracy is accepted for the present
because of the possibility that in the subsequent analysis,
those points allocated to the inappropriate group will be
split off and later correctly combined.
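The allocation of projected points to the peaks on either side of each division point can be sketched as below. The helper names are hypothetical; taking the center of the valley interval as the division point, and sending a point that falls exactly on a division to the right-hand group, are assumptions, since the text does not cover either case.

```python
from bisect import bisect_right

def cut_position(lo, width, valley_index):
    """Location on the line of a division, taken as the center of the valley interval."""
    return lo + (valley_index + 0.5) * width

def allocate(values, cut_points):
    """Group label for each projected value: 0 for the leftmost peak, 1 for the next, ..."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, x) for x in values]
```

For example, with divisions at 1.0 and 3.0 the projected values 0.1, 0.4, 2.0, 2.2 and 5.0 receive the labels 0, 0, 1, 1 and 2.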

Then, assuming all of the points have been allocated
among several maxima, that is, n-dimensional space has been
partitioned into several mutually exclusive and collectively
exhaustive regions, the process is continued for each of the
regions separately. That is, the entire procedure for
finding the best line and analyzing the distribution of
points is iterated for each of the regions individually.
This is continued for each subregion until none of the
regions will subdivide further, or in other words, until
there does not exist a histogram of any of the regions which
contains more than one maximum. A summary of this divisive
procedure for a typical distribution of points could be
represented schematically as follows:


Figure 8


In Figure 8 the procedure has divided n-dimensional
space into seven regions. The original maximum variance
line contained three distinct maxima and n-dimensional space
was partitioned into three regions, r_1 + r_2, r_3 + r_4, and
r_5 + r_6 + r_7. The maximum variance line for r_1 + r_2 and
that for r_3 + r_4 showed two maxima and that for
r_5 + r_6 + r_7 contained three. Regions r_1, r_2, r_3, r_4,
r_5, r_6, r_7 were unable to further divide because each had
a unimodal histogram of its individual dispersion of points
when projected on the maximum variance line. The result is
that n-dimensional space has been partitioned into seven
mutually exclusive and collectively exhaustive regions.
E. CORRECTING THE ERROR

As noted previously, in the process of separating points
between peaks based solely on the zero slope, positive
second derivative criterion, points may have been split from
group r_i with which they truly cluster and included in
group r_j. If this phenomenon has occurred, there then
exist two possible routes these points may have followed:
(1) in later analysis of histograms for group r_j or any of
its subgroups, the foreign points were observed to cluster
and form a separate peak; they were then split away from the
remaining points of r_j and formed a separate group or
(2) due to insufficient clustering, no new maxima containing
these points were formed and they were not split from group
r_j or any of its subgroups. A procedure for resolving the
problem generated by the second alternative has not been
developed and must be absorbed as risk. However, the
probability of taking the second route is much less than that
of taking the first, a result of the aggressive dividing
rule.

The implication of the first alternative is that
n-dimensional space has been partitioned into too many regions.
By combining some of the regions this error can be
rectified; that is, adjacent groups must be combined to reduce
the number of regions. For definitional purposes those
groups adjacent to group r_i will be (1) that group whose
centroid is the least number of group r_i's standard
deviations from group r_i's centroid and (2) that group
containing the point which is closest to any point of group
r_i, or in other words, that group whose hypersurface is
closest to group r_i's hypersurface. The standard
deviation in n dimensions is calculated solely on the basis of
distance from the mean and is thus not a directional
quantity. The sample variance is


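$$s^2 = \frac{1}{m-1} \sum_{i=1}^{m} \lvert c_i - \bar{c} \rvert^2$$

and the standard deviation is $\sqrt{s^2}$,

where:
     $m$ = the number of points
     $c_i$ = the coordinates of the ith point = $(c_{i1}, c_{i2}, \ldots, c_{in})$
     $n$ = the number of dimensions
     $\bar{c}$ = the coordinates of the mean point
          = $\Bigl( \frac{1}{m} \sum_{h=1}^{m} c_{h1},\; \frac{1}{m} \sum_{h=1}^{m} c_{h2},\;
             \ldots,\; \frac{1}{m} \sum_{h=1}^{m} c_{hn} \Bigr)$.

Thus the distance of any point, c_i, from the mean is

$$\lvert c_i - \bar{c} \rvert = \sqrt{ \Bigl( c_{i1} - \frac{1}{m} \sum_{h=1}^{m} c_{h1} \Bigr)^2
      + \Bigl( c_{i2} - \frac{1}{m} \sum_{h=1}^{m} c_{h2} \Bigr)^2 + \cdots
      + \Bigl( c_{in} - \frac{1}{m} \sum_{h=1}^{m} c_{hn} \Bigr)^2 } .$$
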


Both definitions of adjacent groups are applied in
Figure 9.

Figure 9

In Figure 9 group r_3's centroid is only two of group r_1's
standard deviations from group r_1's centroid and is thus
adjacent. Group r_2 is also adjacent to r_1 since its
surface is closest to r_1's.
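The two definitions of adjacency can be sketched as follows. This is an illustration under assumptions: each group is a list of coordinate tuples, the "hypersurface" distance is approximated by the smallest distance between any point of one group and any point of the other, and all function names are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

def centroid(points):
    """Mean point of a group."""
    m = len(points)
    return [sum(p[k] for p in points) / m for k in range(len(points[0]))]

def std_dev(points):
    """Non-directional standard deviation: distances from the mean, divisor m-1."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / (len(points) - 1))

def centroid_separation(groups, i, j):
    """Distance between the centroids of groups i and j in units of group i's std dev."""
    return math.dist(centroid(groups[i]), centroid(groups[j])) / std_dev(groups[i])

def adjacent_groups(groups, i):
    """Indices of the groups adjacent to groups[i] under definitions (1) and (2)."""
    others = [j for j in range(len(groups)) if j != i]
    ci = centroid(groups[i])
    # Dividing by group i's standard deviation rescales every candidate equally,
    # so the nearest centroid can be chosen on raw distance.
    by_centroid = min(others, key=lambda j: math.dist(centroid(groups[j]), ci))
    by_surface = min(others, key=lambda j: min(math.dist(p, q)
                                               for p in groups[i] for q in groups[j]))
    return {by_centroid, by_surface}
```

The two definitions may select the same group, in which case the returned set has a single member.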

Taking group r_i with each of its adjacent groups
individually in n-dimensional space, the optimal line is
determined and the histogram analyzed. The rule for making
divisions becomes much more stringent than in the earlier
phases of analysis in that partitions of space are not made
as readily. The combination of points is separated only if
there exists a point between two maxima such that the slope
of the curve-fitted histogram is zero, and the value of the
curve also equals zero. The reason for this change in
policy is that to err by making too few divisions, which
appears to be unlikely, is seemingly more acceptable than to
err by making too many divisions. This phase is continued
until none of the groups will combine with any of its
neighboring groups to form one clump. A typical combination
process of the earlier illustrated partitioning of
n-dimensional space into seven groups is represented by Figure
10. Initially there are seven groups, r_1, r_2, ..., r_7.
The symbols in each box represent the group and those
immediately below the box represent the adjacent groups of
the group designated in the box.


First, r_1 is taken with each of its adjacent groups
and it is found that it combines with r_2 but not r_4.
In the second step it is determined that r_1 + r_2 will
not combine with its adjacent group r_4. When r_3 is
examined with each of its adjacent groups it is found that
it does not unite with r_4 but will with r_1 + r_2.
In the third step it can be seen that group r_1 + r_2
+ r_3 will not combine with its adjacent group r_4.
Finally it is found that r_5 combines with r_6. In the
last step, r_5 and r_6 are united with r_7, resulting in
three final groups, r_1 + r_2 + r_3, r_4 and r_5 + r_6 + r_7.
Figure 10

Thus far cluster analysis under this method may be
summarized in the following steps. 

(1) Determine the optimal line using the non-linear 
programming method. 

(2) Construct a histogram representing the distribution 
of the projected points. 

(3) Analyze the histogram and form a group for each 
peak. 

(4) Repeat Steps 1, 2, and 3 for each mode and its 
associated group until none will subdivide. 


(5) Analyze the combinations of adjacent groups. 
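Steps (1) through (4) above can be strung together in a compact sketch. It is a rough illustration, not the procedure as implemented by the author: the maximum variance line is found by power iteration on the covariance matrix (a substituted technique), raw interval counts stand in for the curve-fitted histogram, divisions are placed at the center of the valley interval, regions of fewer than four points are left undivided, and step (5) is omitted. All names are hypothetical. Because raw counts are jagged, this sketch will usually divide more readily than a smoothed histogram would.

```python
import math

def direction(points, iters=200):
    """Step 1: maximum variance direction by power iteration (substituted technique)."""
    m, n = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / m for k in range(n)]
    b = [[p[k] - mean[k] for k in range(n)] for p in points]
    s = [[sum(r[k] * r[j] for r in b) / (m - 1) for j in range(n)] for k in range(n)]
    u = [1.0 / (k + 1) for k in range(n)]           # arbitrary start (assumption)
    for _ in range(iters):
        v = [sum(s[k][j] * u[j] for j in range(n)) for k in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        u = [x / norm for x in v]
    return u

def split(points, ratio=0.8):
    """Steps 1-3: project, count points in m/2 intervals, divide at deep valleys."""
    m = len(points)
    if m < 4:                                        # too few points to analyze (assumption)
        return [points]
    u = direction(points)
    x = [sum(c * w for c, w in zip(p, u)) for p in points]
    lo, hi = min(x), max(x)
    if hi == lo:
        return [points]
    bins = m // 2
    width = (hi - lo) / bins
    counts = [0] * bins
    for xi in x:
        counts[min(int((xi - lo) / width), bins - 1)] += 1
    cuts, peak, valley, idx = [], counts[0], counts[0], 0
    for k in range(1, bins):
        c = counts[k]
        if c < valley:
            valley, idx = c, k
        elif c > valley and valley < ratio * min(peak, c):
            cuts.append(lo + (idx + 0.5) * width)    # division at the valley center
            peak, valley, idx = c, c, k
        elif c >= peak:
            peak, valley, idx = c, c, k
    groups = [[] for _ in range(len(cuts) + 1)]
    for p, xi in zip(points, x):
        groups[sum(xi >= c for c in cuts)].append(p)
    return [g for g in groups if g]

def subdivide(points):
    """Step 4: repeat the split on every region until none will subdivide."""
    parts = split(points)
    if len(parts) == 1:
        return parts
    return [leaf for part in parts for leaf in subdivide(part)]
```

Calling `subdivide` on a list of coordinate tuples returns the final regions as lists of points; combining adjacent regions would then follow as a separate pass.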


F. REFINEMENT OF THE PROCEDURE 

As mentioned earlier, there are cases in which the line 
providing maximum variance of the points is not favorable to 
group detection. One such case involves two side-by-side
elongated ellipsoids. Consider Figure 11.

Figure 11

In this instance, the points would be projected on line ℓ_1
and only one group would be detected. Similarly, in
n-dimensional space the same problem would be posed, that is,
the maximum variance line would make it possible to detect
only one group. It is, therefore, necessary to amend the
procedure in a manner which can resolve the elongated
ellipsoid case and many of its variations. In most cases it can
be seen that there does exist a line onto which the points
could be projected which would result in a histogram
containing two maxima and hence, the detection of two groups.


Figure 12 shows that this is indeed the situation: 





Figure 12








In each diagram, ℓ1 represents the maximum variance line
and ℓ2 a line which facilitates the detection of two
groups. It is interesting to notice that in each case ℓ2
is perpendicular to ℓ1. For the spatial dispersions of
points under consideration, that is, for elongated
ellipsoids, this result is intuitively appealing.

However, as more characteristics are observed and the
space is extended beyond two dimensions, the number of families
of lines directionally perpendicular to the maximum variance
line, ℓ1, increases from one in two-space to infinity
when the dimension of the space is three or more. In the
two-dimensional case it can be reasoned that there is only
one family of parallel lines perpendicular to line ℓ1.





For the purpose of analyzing point distributions and to be
consistent with the non-linear programming solution for the
unit vector in the direction of ℓ1, it is reasonable to
specify that all lines onto which points are to be
projected must pass through the origin. In essence, it is the
slope of the line, not its axis intercepts, which is of
importance. Similarly, in n-dimensional space the direction
or spatial orientation of the line is the only concern.

In the following argument it can be assumed without loss
of generality that all lines under discussion pass through
the origin. Also ℓ1 will designate the maximum variance
line and ℓ2 will designate the line perpendicular to ℓ1
which successfully handles the elongated ellipsoid. The
problem is thus one of ascertaining the line, ℓ2,
orthogonal to the maximum variance line which could better
discriminate between areas of high point density. It is,
therefore, desirable to find a line on which a histogram of
the distribution of the projected points will be unimodal
if only one group exists and contain two local maxima if
two exist. This line, ℓ2, is that line which is
perpendicular to ℓ1 and on which the variance of the projected
points is a relative maximum. That is, considering Figure
12, ℓ2 provides a maximum variance between points relative
to all other lines perpendicular to the absolute maximum
variance line ℓ1, of which, in two dimensions, there are
none. In the three-dimensional analog it can again be
reasoned that the line which will facilitate the detection
of two groups, if two exist, is analogous to ℓ2. It is
again that line which is perpendicular to the absolute
maximum variance line, ℓ1, and is itself a relative maximum
variance line, relative to all those lines perpendicular to
ℓ1. As the dimension of the space is increased from three
to n dimensions, the line ℓ2 which is of interest is
expected to possess the same characteristics.

The problem of finding such a line can be formulated as
a non-linear programming problem of much the same structure
as the original method. The only mutation is the addition
of a constraint which specifies that the unit vector in the
direction of ℓ2 must be perpendicular to that in the
direction of ℓ1. Hence, the dot product of the two unit
vectors is zero.


The non-linear programming problem is thus:

$$\max \; S^2 = \frac{1}{m} \sum_{i=1}^{m} \left( x_i - \bar{x} \right)^2$$

subject to

$$w^{T} w = 1$$
$$w^{T} u = 0$$

where w is a unit vector in the direction of ℓ2,
u is a unit vector in the direction of ℓ1
(note that all of the coefficients of u are known at this
point) and the remaining symbols are unchanged in
interpretation from the original programming problem.

The problem now becomes

$$\max \; S^2 = \frac{1}{m} \sum_{i=1}^{m} \left( \sum_{j=1}^{n} c_{ij} w_j \right)^2$$

subject to

$$w^{T} w = 1$$
$$w^{T} u = 0$$

where c_ij is the j-th coordinate of the i-th point measured
from the mean of the j-th coordinate over all m points.

The Lagrangian is thus

$$F(w_1, \ldots, w_n, \lambda_1, \lambda_2) = S^2 - \lambda_1 \left( w^{T} w - 1 \right) - \lambda_2 \left( w^{T} u \right)$$

$$= \frac{1}{m} \sum_{i=1}^{m} \left( \sum_{j=1}^{n} c_{ij} w_j \right)^2 - \lambda_1 \left( w^{T} w - 1 \right) - \lambda_2 \left( w^{T} u \right).$$

As before the partial of the Lagrangian with respect to each
variable can be set equal to zero to find the optimal vector
w.

Taking the partial with respect to the kth component
of w yields

$$\frac{\partial F}{\partial w_k} = \frac{\partial}{\partial w_k} \left[ \frac{1}{m} \sum_{i=1}^{m} \left( \sum_{j=1}^{n} c_{ij} w_j \right)^2 \right] - \frac{\partial}{\partial w_k} \left[ \lambda_1 \left( w^{T} w - 1 \right) \right] - \frac{\partial}{\partial w_k} \left[ \lambda_2 \, w^{T} u \right].$$

The first term is exactly the same as in the original problem
with u replaced by w and thus the equation reduces to

$$0 = \sum_{i=1}^{m} b_{ik} \sum_{j=1}^{n} c_{ij} w_j - 2 \lambda_1 w_k - \lambda_2 \frac{\partial \left( w^{T} u \right)}{\partial w_k}$$

where $b_{ik} = \frac{2}{m} c_{ik}$.

Differentiating the last term with respect to w_k yields only
one term, that component of u which is multiplied by w_k.
The derivative of F with respect to w_k reduces to

$$0 = \sum_{i=1}^{m} \sum_{j=1}^{n} b_{ik} c_{ij} w_j - 2 \lambda_1 w_k - \lambda_2 u_k.$$

There will be n equations of this form, where n is the
number of dimensions. Differentiating with respect to the
Lagrange multipliers and setting the derivative equal to
zero yields the constraints:

(4) $w^{T} w = 1$

(5) $w^{T} u = 0$.

There are now n+2 equations in n+2 unknowns which can be
solved to give the vector w in the direction of ℓ2.

The additional step of the procedure is thus to find the
line ℓ2 for each group and observe the resulting histogram
of the projected points on ℓ2. In this manner the
possibility of missing a group is reduced.
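
The search for the relative maximum variance line can be illustrated with a short Python sketch. It is offered under explicit assumptions: rather than solving the n+2 simultaneous equations, it uses power iteration on the sample covariance matrix and removes the component along the first line at every step, which keeps the result perpendicular to it. The function name, the starting vector and the test data (two side-by-side elongated clumps) are hypothetical.

```python
import math


def variance_direction(points, orthogonal_to=None, iters=500):
    """Unit vector of largest projected variance, optionally kept
    perpendicular to a given unit vector (the constraint w.u = 0)."""
    m, n = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / m for j in range(n)]
    centered = [[p[j] - mean[j] for j in range(n)] for p in points]
    cov = [[sum(r[j] * r[k] for r in centered) / m for k in range(n)]
           for j in range(n)]

    def deflate(v):
        # Remove the component of v along the excluded direction.
        if orthogonal_to is None:
            return v
        d = sum(a * b for a, b in zip(v, orthogonal_to))
        return [a - d * b for a, b in zip(v, orthogonal_to)]

    def unit(v):
        norm = math.sqrt(sum(a * a for a in v))
        if norm == 0.0:
            raise ValueError("starting vector lies along the excluded line")
        return [a / norm for a in v]

    # Arbitrary starting vector (an assumption of this sketch).
    w = unit(deflate([float(j + 1) for j in range(n)]))
    for _ in range(iters):
        v = deflate([sum(cov[j][k] * w[k] for k in range(n))
                     for j in range(n)])
        if math.sqrt(sum(a * a for a in v)) == 0.0:
            break  # no variance remains perpendicular to the first line
        w = unit(v)
    return w


# Two side-by-side elongated clumps, as in the elongated ellipsoid case.
points = ([(float(x), 0.0) for x in range(11)]
          + [(float(x), 2.0) for x in range(11)])
l1 = variance_direction(points)                    # absolute maximum
l2 = variance_direction(points, orthogonal_to=l1)  # relative maximum
on_l2 = sorted({round(abs(p[0] * l2[0] + p[1] * l2[1]), 6) for p in points})
```

For these points the projections on the first line are spread evenly, while the projections on the second line fall at only two distinct positions, so the second histogram separates the two clumps.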
The termination of the elongated ellipsoid analysis
completes the cluster analysis procedure. In summary the
method is described in the following steps.

(1) Determine the optimal line using the non-linear
programming method shown previously.

(2) Construct a histogram representing the distribution
of the projected points.

(3) Examine the histogram and form a group for each
peak.

(4) Repeat Steps 1, 2, and 3 for each maximum and its
associated group until none will subdivide.

(5) Analyze the combinations of adjacent groups.

(6) Examine each group to ensure that it cannot be
reasonably split into two groups.


It is to be noted that the implementation of the final 
step need not be confined to the end. For the most favor- 
able result, when the points of a region are projected on 
their absolute maximum variance line, they should also be 
projected on their relative maximum variance line. Then, if 
a division of the region is made, it should be done in
accordance with the histogram on that line, ℓ1 or ℓ2,
which most clearly divides the points.


G. INTERPRETATIONS 

In light of the earlier discussion of bell-shaped
histograms, some interesting observations concerning the
various groups can now be made. As noted previously, the
more closely the distribution of points in a cluster
approximates a hyperball, the more well defined and bell-like
is the curve-fitted histogram representing the distribution
of points when orthogonally projected on the optimal line.
Thus, simply by observing the histogram for each group, it
is possible to make comparisons between groups and to derive
useful information about the probable shape of the
hypervolume spanned by the points.

Probably the most influential factor in making a
judgement will revolve around some within-group variance study.
In this case, the magnitude of the variance would not be a
sufficient comparable statistic between groups; it could
easily occur that group ri and group rj are equally
dense, where density is points per unit volume, but group
rj is larger and contains twice as many points as group
ri. This results in group rj having a characteristically
larger variance, even though both groups may span a
similarly shaped hypervolume.

One meaningful comparable statistic could be formed as
follows: (1) calculate the standardized variance, i.e., the
variance assuming the entire line segment for each group has
length one and (2) form the ratio of the height of the
mode to that variance, assuming each point in an interval
contributes a unit of height to the curve. Observe that the
smaller this ratio, the more elongated is the ellipse; in
fact, in the limit as the ratio tends to zero, the point
distribution approaches a straight line segment.
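
A minimal Python sketch of this statistic follows. It assumes the projected positions of a group's points are already available, rescales them so the occupied segment has length one, and takes the tallest histogram interval as the mode height. The function name and the choice of ten intervals are assumptions made only for illustration.

```python
def mode_to_variance_ratio(projections, bins=10):
    """Height of the histogram mode divided by the standardized variance."""
    lo, hi = min(projections), max(projections)
    span = hi - lo
    if span == 0:
        raise ValueError("all projected points coincide")
    # Rescale so that the occupied line segment has length one.
    scaled = [(v - lo) / span for v in projections]
    mean = sum(scaled) / len(scaled)
    variance = sum((s - mean) ** 2 for s in scaled) / len(scaled)
    # Each point contributes one unit of height to its interval.
    counts = [0] * bins
    for s in scaled:
        counts[min(int(s * bins), bins - 1)] += 1
    return max(counts) / variance


# Points spread evenly along the line versus points piled near the middle.
spread_out = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bell_like = [0, 4, 5, 5, 5, 5, 5, 5, 5, 6, 10]
ratio_spread = mode_to_variance_ratio(spread_out)
ratio_bell = mode_to_variance_ratio(bell_like)
```

The evenly spread group yields the smaller ratio, in keeping with the observation that a small ratio signals an elongated dispersion.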
As a further observation it can be seen that the shape
of the ellipsoid yields information concerning the original
n specimen variates. Knowledge that a group is an
elongated ellipsoid in conjunction with knowledge of its
orientation in space is useful in eliciting information about the
measured quantities. The process of ascertaining the
dominant axis direction, or ellipsoid orientation, is
accomplished in the following manner: as before, that line onto
which the variance of the orthogonally projected points is a
maximum can be determined. This will be that line which is
approximately parallel to the longest axis of the ellipsoid
and hence, the orientation is specified. Then two
particularly interesting cases can occur: (1) the direction of the
line may coincide closely with a coordinate axis, that is,
the "cigar-shaped" distribution of points may have its
longest axis approximately parallel to one of the axes of
the coordinate system or (2) the distribution of points may
be oriented such that its longest axis lies directionally
near the forty-five degree (45°) line between two coordinate
axes.


Consider Figure 13, an illustration of the first case.


Figure 13 
It should be realized that an elongated ellipsoid parallel
to an axis indicates that the observations of the sundry
characteristics are not varying comparably. That is, one
dimension is exerting a relatively powerful force in the
determination of point dispersion. Said differently, the
observations of one characteristic are varying considerably
while observations of the other fluctuate only slightly.
Furthermore, an increase in one variate does not affect the
other. Thus, for the n-dimensional case, the characteristic
represented by the coordinate axis which is parallel to the
major axis of the ellipsoid is independent of all other
coordinate axes.

The second case, in which a different orientation of the
ellipsoid is considered, will first be examined in
two-dimensional space. Figure 14 will help to illuminate the
discussion.


Figure 14 (coordinate axes x1 and x2)

The elongated ellipsoid has its major axis running in a
direction near parallel to the forty-five degree line between
the x1 and x2 coordinate axes. The implication in this
event is that there exists an interaction between the two
measured specimen characteristics represented by x1 and
x2; hence, the variates are not independent. As an example,
suppose that blood pressure and age of humans were being
measured. Generally speaking, blood pressure rises with age
and it can be expected that an increase in age would effect
an increase in blood pressure. Thus, the expected
distribution of points would be in the form of an elongated ellipse
whose major axis is positively sloping and directionally
close to the forty-five degree line between the
representative coordinate axes. An elongated ellipse negatively
sloping at approximately 135° between two axes indicates a
complementary trade-off between specimen variates; an
increase in one variate is characteristically accompanied by
a decrease in the other. Extending this concept to
n-dimensional space, it can be seen that if the orthogonal
projection of a group in the plane formed by any two
dimensions results in a "cigar-shaped" distribution of points
oriented directionally close to the forty-five degree line
between the axes, then the two specimen variates represented
by those axes are probably directly related. Said
differently, if the unit vector in the direction of the line which
maximizes the variance of the projected points of an
elongated ellipsoid, as found using the non-linear programming
method, contains two direction cosines which are
approximately equal, then there may exist a dependence between the
underlying variates. In addition, if it is desired to know
more precisely the relation between two variates a linear
regression analysis can be performed.
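
The two cases can be screened for directly from the direction cosines of the maximum variance line. The Python sketch below is only one hedged way of doing so: the function name and all three tolerances are assumptions chosen for illustration, and treating cosines of equal size but opposite sign as a trade-off extends the 135° remark above rather than quoting a rule from this paper.

```python
def interpret_direction(u, axis_tol=0.95, pair_tol=0.1, min_size=0.3):
    """Screen a unit vector of direction cosines for the two cases.

    Returns (dominant_axes, related_pairs), where related_pairs holds
    tuples (j, k, kind) with kind either "direct" or "trade-off".
    """
    # Case 1: the line nearly coincides with one coordinate axis.
    dominant_axes = [j for j, c in enumerate(u) if abs(c) >= axis_tol]
    # Case 2: two sizeable direction cosines of nearly equal magnitude.
    related_pairs = []
    for j in range(len(u)):
        for k in range(j + 1, len(u)):
            a, b = u[j], u[k]
            if (min(abs(a), abs(b)) >= min_size
                    and abs(abs(a) - abs(b)) <= pair_tol):
                kind = "direct" if a * b > 0 else "trade-off"
                related_pairs.append((j, k, kind))
    return dominant_axes, related_pairs


axis_case = interpret_direction([0.999, 0.04])
direct_case = interpret_direction([0.7071, 0.7071])
tradeoff_case = interpret_direction([0.7071, -0.7071])
```

A flagged pair is only a prompt for further study, for instance the linear regression mentioned above; it is not evidence of dependence in itself.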
















H. CAUTION 

Although the procedure presented in this paper is capable
of handling a wide variety of point dispersion patterns,
there still exist many cases in which the techniques of this
method are inadequate. For example, consider Figures 15a and
15b.








Figure 15a                    Figure 15b


In Figure 15a it is not possible to detect the smaller group
completely surrounded by a "donut-shaped" group. The
n-dimensional analog of this is a hyperball (solid sphere)
contained in a hypersphere (hollow sphere). Again it can be
seen that only one group would be detected. In Figure 15b
ℓ1 is the maximum variance line and ℓ2 is perpendicular
to ℓ1. It is possible that there exists precisely that
amount of overlap between the group projections on ℓ1 to
cause the histogram to be unimodal. In that case only one
group is detected using ℓ1. When line ℓ2 is examined
again only one group is detected. In either case A or B it
is clear that it is not desirable to term the entire
dispersion of points one cluster.







Many of the rules set forth in this work are designed to
handle a general case. The user may wish to alter some of
these to fit his specific data, his motives for using cluster
analysis and his desired accuracy. For example, depending
upon the specific criteria selected the two-dimensional
point dispersion shown in Figure 16 might be termed as one
group or as two by the method presented in this paper.


Figure 16


The reader may decide if the point dispersion is one or two
groups by relaxing or tightening the constraint for making
divisions when combining adjacent groups.
III. EXAMPLE

A. EXPLANATION OF THE DATA 

To illustrate the operation of the analysis procedure, 
an example was chosen in which thirteen variates were 
measured on 254 specimens. The specimens are naval officers 
who attended the Naval Postgraduate School, Monterey, during 
the years 1964 through 1966. The variates are the results 
of several measurements derived from the Allport, Vernon, and 
Lindzey (AVL) and the Edwards Personal Preference Schedule 
(EPPS) psychological tests. Through the theory of psychology 
and personality, as set forth in Edward Spranger's Types of 
Men, the two above tests measure a total of twenty-one 
aspects of human nature, six in the first test and fifteen 
in the second. To avoid repetition, only thirteen of these 
variates are used in the cluster analysis example, six from
AVL and seven from the EPPS. 

The six scores from the AVL reflect the following aspects
of human make-up: (1) theoretical, which pertains to the
desire to pursue truth, (2) economic, or the interest in
what is useful, (3) aesthetic, which involves the interest
in form and harmony, (4) social, or the high esteem for love
of people and unselfishness, (5) political, or lust for
power and (6) religious, which involves the desire for unity
or comprehension of the whole.

Those variates chosen from the EPPS measure: (1)
achievement, or the ability to accomplish tasks requiring skill and
effort, (2) deference, which involves the acceptance or
acknowledgement of the leadership of others, (3)
affiliation, or loyalty to friends or groups, (4) intraception, or
the ability to analyze the behavior of others, (5)
dominance, or the facility to be a leader, (6) nurturance, which
is giving of oneself to others who need kindness or
understanding and (7) endurance, or the ability to see a job
through to completion.

It is desirable to know if the 254 naval officers cluster 
as one high point density area or as several when the above 
thirteen variates are measured on each. If there do exist 
several groups, the characteristics of each could be analyzed 
and perhaps yield valuable information to the Navy. For 
example, if most of the officers in one group were those who 
voluntarily resigned, the AVL and the EPPS could then be
used as a predictor to aid the Navy in determining before- 
hand those officers who would probably not make the service 
a career. Such a result would indeed have great usefulness 
within the Navy. On the other hand, if points representing 
the officers cluster into only one group, the indication is 
that the AVL and EPPS would not be successful as predictors 


for officer career patterns. 


B. PROCEDURE 

The first step in the cluster analysis method is to
project the points on their maximum variance line. The
Fortran IV program included at the end of this paper first
determines the set of non-linear equations to be solved,
then determines their solution, which is in the form of the
unit vector in the direction of the maximum variance line,
and ultimately projects all points on that line. The only
inputs necessary to the program are the coordinates of all
the points and the number of dimensions. Using this program
it was found that the initial set of fourteen simultaneous
non-linear equations to be solved when considering all 254
points is:


$$A u - 2 \lambda u = 0$$
$$u^{T} u = 1$$

where

$$u = (u_1, u_2, \ldots, u_{13})$$

and A is the thirteen by thirteen array of numbers
defined on the following page.
Their solution in terms of u1 through u13 is
u = (-.32, .72, -.23, -.06, -.08, -.18, -.10, -.07, -.26,
-.26, -. MG, -.17, -.23)
This vector, u, is the unit vector in the direction
of the maximum variance line. The 254 points were then
projected on the line whose direction is given by the above
unit vector. As can be seen by the first histogram in the
appendix, the curve-fitted histogram of the projected points
is unimodal. Therefore, the points were projected on the
relative maximum variance line, which is perpendicular to
the absolute maximum variance line. The curve-fitted
histogram as shown in the appendix has two local maxima and
a division can be made in accordance with the division rule
requiring the local minimum to be no greater than
eight-tenths the height of the lesser adjacent peak. Thus,
n-dimensional space has been partitioned into two mutually
exclusive and collectively exhaustive regions: r1,
containing 142 points, and r2, containing the remaining 112
points. The next two curves indicate that the region r1
would not subdivide further, either when its points were
projected on the absolute maximum variance line or the
relative maximum variance line. When region r2 was considered
it was found that a division was possible when the points
were projected on their absolute maximum variance line.
Region r2 was, therefore, partitioned into r21 and r22,
with fifty-two points allocated to the former and sixty to
the latter.
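
The division rule quoted above lends itself to a brief Python sketch. It works on a list of histogram heights and reports each valley at which a division would be made, that is, each local minimum no greater than eight-tenths the height of the lesser adjacent peak. The function name and the treatment of flat tops are assumptions, and the heights are taken as given rather than curve-fitted.

```python
def division_valleys(heights, ratio=0.8):
    """Indices of local minima that justify a division between two peaks."""
    last = len(heights) - 1
    # A peak rises above its left neighbor and is not below its right one.
    peaks = [i for i, h in enumerate(heights)
             if h > 0
             and (i == 0 or h > heights[i - 1])
             and (i == last or h >= heights[i + 1])]
    valleys = []
    for left, right in zip(peaks, peaks[1:]):
        # Lowest point strictly between two neighboring peaks.
        between = range(left + 1, right)
        if not between:
            continue
        low = min(between, key=lambda i: heights[i])
        lesser_peak = min(heights[left], heights[right])
        if heights[low] <= ratio * lesser_peak:
            valleys.append(low)
    return valleys


clear_split = division_valleys([2, 9, 3, 8, 1])    # deep valley
shallow_dip = division_valleys([2, 9, 8, 9, 1])    # dip too shallow
single_peak = division_valleys([1, 5, 1])          # unimodal
```

Raising or lowering the `ratio` argument corresponds to relaxing or tightening the constraint for making divisions.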

As shown in the following diagrams, it was not possible
to make any further division of any of the regions.
Furthermore, region r21 was found to be adjacent to r22. The
two regions were combined into one, r2, since the zero
slope, positive second derivative point on the curve-fitted
histogram had value approximately eighty percent the height
of the lesser peak. Similarly, r1 and r2 were combined.
According to this method, then, the points are one cluster.
It is interesting to note that this result is in agreement
with several different statistical techniques applied to
this same data. These methods include various forms of
regression analysis and a discriminant analysis technique
developed at the University of California at Berkeley and
presented by J. W. Dixon (Ref. 2).


C. CONCLUSION

The conclusion which must be reached in accordance with 
this cluster analysis on the given data is that the AVL and 
EPPS were not shown to be valid for distinguishing among 
Naval officers with respect to the thirteen criteria 
mentioned. It should be noted, however, that the data used 
was very narrow-based; that is, it represented only a small 
portion of the officers in the Navy, namely a fraction of 
those attending graduate school. If a more representative 
sample could be obtained, it is indeed feasible that 


different results would be obtained. 
IV. APPENDIX A: HISTOGRAM ANALYSIS


This appendix contains ten histograms and their
curve-fitted approximations. On each diagram is the unit vector
specifying the line onto which the points were projected.
In those cases where a division was made the division line
is shown.

Following the histograms is a computer program which
yields sufficient data to facilitate a histogram analysis.
This program first determines the set of non-linear
equations to be solved for the unit vector, it secondly determines
their solution, and thirdly projects the points onto the
line specified by the unit vector. Thus, for each histogram
shown, a set of non-linear equations was found and solved
for the unit vector and the points were subsequently
projected on the specified line by the computer program.